Need Data Engineers not Data Scientists

  • The difference between a scientist and an engineer is prototyping vs productionizing
    • Data scientist - uses statistical and ML techniques to build models on top of data
    • Data engineer - uses software and processing tools to build pipelines for data
    • ML scientist - uses state-of-the-art ML techniques to prototype models
    • ML engineer - uses high-level ML frameworks to train and productionize models
  • There are roughly twice as many open engineering roles as scientist roles
  • With frameworks like TensorFlow and PyTorch, setting up a prototype is getting easier, but "boring" skills like building an ETL pipeline are becoming increasingly scarce
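
To make the "boring" ETL skill above concrete, here is a minimal sketch of an extract-transform-load pipeline using only the Python standard library. The input data, the `payments` table, and all function names are hypothetical and chosen for illustration; a production pipeline would typically also need scheduling, monitoring, retries, and schema management.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; a real pipeline would read from files, APIs, or queues.
RAW_CSV = """user_id,amount
1,10.50
2,not_a_number
1,4.50
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: cast fields to typed values and drop rows that fail validation."""
    clean = []
    for row in rows:
        try:
            clean.append((int(row["user_id"]), float(row["amount"])))
        except ValueError:
            continue  # skip malformed rows instead of failing the whole run
    return clean


def load(records, conn):
    """Load: write cleaned records into a SQLite table (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payments (user_id INTEGER, amount REAL)"
    )
    conn.executemany("INSERT INTO payments VALUES (?, ?)", records)
    conn.commit()


def run_pipeline(text, conn):
    """Run the three stages in order."""
    load(transform(extract(text)), conn)


conn = sqlite3.connect(":memory:")
run_pipeline(RAW_CSV, conn)
totals = conn.execute(
    "SELECT user_id, SUM(amount) FROM payments GROUP BY user_id ORDER BY user_id"
).fetchall()
print(totals)
```

Keeping each stage as a separate function is one possible design choice: it lets the validation rules in `transform` be tested without touching a database.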